Mesopotamia
Indigenous Languages Spoken in Argentina: A Survey of NLP and Speech Resources
Ticona, Belu, Carranza, Fernando, Cotik, Viviana
Argentina has a large yet little-known Indigenous linguistic diversity, encompassing at least 40 different languages. The majority of these languages are at risk of disappearing, resulting in a significant loss of world heritage and cultural knowledge. Currently, unified information on speakers and computational tools is lacking for these languages. In this work, we present a systematization of the Indigenous languages spoken in Argentina, classifying them into seven language families: Mapuche, Tup\'i-Guaran\'i, Guaycur\'u, Quechua, Mataco-Mataguaya, Aymara, and Chon. For each one, we present an estimation of the national Indigenous population size, based on the most recent Argentinian census. We discuss potential reasons why the census questionnaire design may underestimate the actual number of speakers. We also provide a concise survey of computational resources available for these languages, whether or not they were specifically developed for Argentinian varieties.
- South America > Paraguay (0.15)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.06)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.05)
- (26 more...)
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Park, Yein, Yoon, Chanwoong, Park, Jungwoo, Lee, Donghyeon, Jeong, Minbyul, Kang, Jaewoo
Large language models (LLMs) have brought significant changes to many aspects of our lives. However, assessing and ensuring their chronological knowledge remains challenging. Existing approaches fall short in addressing the temporal adaptability of knowledge, often relying on a fixed time-point view. To overcome this, we introduce ChroKnowBench, a benchmark dataset designed to evaluate chronologically accumulated knowledge across three key aspects: multiple domains, time dependency, temporal state. Our benchmark distinguishes between knowledge that evolves (e.g., personal history, scientific discoveries, amended laws) and knowledge that remain constant (e.g., mathematical truths, commonsense facts). Building on this benchmark, we present ChroKnowledge (Chronological Categorization of Knowledge), a novel sampling-based framework for evaluating LLMs' non-parametric chronological knowledge. Our evaluation led to the following observations: (1) The ability of eliciting temporal knowledge varies depending on the data format that model was trained on. (2) LLMs partially recall knowledge or show a cut-off at temporal boundaries rather than recalling all aspects of knowledge correctly. Thus, we apply ourChroKnowPrompt, an in-depth prompting to elicit chronological knowledge by traversing step-by-step through the surrounding time spans. We observe that it successfully recalls objects across both open-source and proprietary LLMs, demonstrating versatility, though it faces challenges with dynamic datasets and unstructured formats.
- Leisure & Entertainment > Sports > Soccer (1.00)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Mining Path Association Rules in Large Property Graphs (with Appendix)
Sasaki, Yuya, Karras, Panagiotis
How can we mine frequent path regularities from a graph with edge labels and vertex attributes? The task of association rule mining successfully discovers regular patterns in item sets and substructures. Still, to our best knowledge, this concept has not yet been extended to path patterns in large property graphs. In this paper, we introduce the problem of path association rule mining (PARM). Applied to any \emph{reachability path} between two vertices within a large graph, PARM discovers regular ways in which path patterns, identified by vertex attributes and edge labels, co-occur with each other. We develop an efficient and scalable algorithm PIONEER that exploits an anti-monotonicity property to effectively prune the search space. Further, we devise approximation techniques and employ parallelization to achieve scalable path association rule mining. Our experimental study using real-world graph data verifies the significance of path association rules and the efficiency of our solutions.
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- South America > Argentina > Mesopotamia > Misiones Province (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (5 more...)
Online machine-learning forecast uncertainty estimation for sequential data assimilation
Sacco, Maximiliano A., Pulido, Manuel, Ruiz, Juan J., Tandeo, Pierre
Quantifying forecast uncertainty is a key aspect of state-of-the-art numerical weather prediction and data assimilation systems. Ensemble-based data assimilation systems incorporate state-dependent uncertainty quantification based on multiple model integrations. However, this approach is demanding in terms of computations and development. In this work a machine learning method is presented based on convolutional neural networks that estimates the state-dependent forecast uncertainty represented by the forecast error covariance matrix using a single dynamical model integration. This is achieved by the use of a loss function that takes into account the fact that the forecast errors are heterodastic. The performance of this approach is examined within a hybrid data assimilation method that combines a Kalman-like analysis update and the machine learning based estimation of a state-dependent forecast error covariance matrix. Observing system simulation experiments are conducted using the Lorenz'96 model as a proof-of-concept. The promising results show that the machine learning method is able to predict precise values of the forecast covariance matrix in relatively high-dimensional states. Moreover, the hybrid data assimilation method shows similar performance to the ensemble Kalman filter outperforming it when the ensembles are relatively small.
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Europe > France (0.04)
- South America > Argentina > Mesopotamia > Corrientes Province > Corrientes (0.04)
- (2 more...)
- Research Report > New Finding (0.48)
- Instructional Material > Online (0.40)
Evaluation of Machine Learning Techniques for Forecast Uncertainty Quantification
Sacco, Maximiliano A., Ruiz, Juan J., Pulido, Manuel, Tandeo, Pierre
Producing an accurate weather forecast and a reliable quantification of its uncertainty is an open scientific challenge. Ensemble forecasting is, so far, the most successful approach to produce relevant forecasts along with an estimation of their uncertainty. The main limitations of ensemble forecasting are the high computational cost and the difficulty to capture and quantify different sources of uncertainty, particularly those associated with model errors. In this work proof-of-concept model experiments are conducted to examine the performance of ANNs trained to predict a corrected state of the system and the state uncertainty using only a single deterministic forecast as input. We compare different training strategies: one based on a direct training using the mean and spread of an ensemble forecast as target, the other ones rely on an indirect training strategy using a deterministic forecast as target in which the uncertainty is implicitly learned from the data. For the last approach two alternative loss functions are proposed and evaluated, one based on the data observation likelihood and the other one based on a local estimation of the error. The performance of the networks is examined at different lead times and in scenarios with and without model errors. Experiments using the Lorenz'96 model show that the ANNs are able to emulate some of the properties of ensemble forecasts like the filtering of the most unpredictable modes and a state-dependent quantification of the forecast uncertainty. Moreover, ANNs provide a reliable estimation of the forecast uncertainty in the presence of model error.
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- South America > Argentina > Mesopotamia > Corrientes Province > Corrientes (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)